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Abstract:  Computational  flow  imaging  (CFI)  uses  theoretical  predictions  of  the  interaction  and 
transmission  of  optical  waves  through  theoretical  flowfield  to  generate  digital  pictures  that  simu¬ 
late  real  observations,  but  visualizing  three-dimensional  unstructured  data  from  aerodynamics 
calculations  is  challenging  because  the  associated  meshes  are  typically  large  in  size  and  irregular 
in  both  shape  and  resolution.  The  goal  of  this  research  is  to  develop  a  fast,  efficient  parallel  vol¬ 
ume  rendering  algorithm  of  computational  flow  imaging  (corresponding  fo  shadowgraph, 
schlieren  and  inferferomefric)  for  parallel  disfribufed-memory  supercompufers  consisfing  of  a 
large  number  of  powerful  processors.  We  use  a  preprocessing  procedure  thaf  the  whole  flow  field 
is  divided  info  new  regular  regions  in  which  the  voxels  are  distributed  by  x  and  y  coordinates  so 
that  the  ray-casting  and  field  reconsfrucfing  can  provide  maximum  flexibilify  in  fhe  dafa  disfribu- 
fion  and  rendering  sfeps. 
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1.  Introduction 

Compufafional  flow  imaging  (CFI)  is  a  new  approach  fo  flow  visualizafion.  It  uses  theoretical 
predictions  of  the  interaction  and  transmission  of  optical  waves  through  theoretical  flowfield  fo 
generafe  digifal  picfures  fhaf  simulafe  real  observations.  CFI  is  used  fo  consfrucf  flow  visualizafion 
corresponding  fo  shadowgraph,  schlieren,  inferferomefric  and  Planar  Laser  Induced  Fluorescence 
(PLIF)  images.  CFI  provides  a  better  insight  into  the  flow  physics  and  CFD  code  behavior.  It  is 
proving  to  be  extremely  useful  to  experimentally  validate  CFD  codes.  In  essence  it  is  the  art  and 
science  of  generating  digital  images  of  theoretical  fluid  dynamic  phenomena  in  formafs  fhaf  mimic 
optical  observations  of  real  flowfield.  Models  for  the  flow  measuremenf  processes  musf  fherefore 
be  superimposed  on  the  models  used  to  predict  the  flow  behavior. 

There  are  two  main  challenges  in  CFI.  First,  the  image  recorded  is  integrated  across  the 
three-dimensional  flowfield  for  all  CFI  numerical  processes  (excepf  Planar  Laser  Induced  Fluo¬ 
rescence  in  a  fhin  sheef  passed  info  fhe  flow).  Second,  fhe  CFD  calculafion  is  becoming  more 
complicafed  in  bofh  size  and  shape,  so  parallel  rendering  algorifhms  of  CFI  for  large  dafasef 
should  also  be  developed  correspondenfly. 
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Some  of  the  former  authors^’ only  involve  simple  two-dimensional  problems.  The  methods 
of  Tamura"*,  Yates^  and  Lanen^  are  actually  three-dimensional,  however,  for  the  line-of-sight  inte¬ 
gration  scheme,  not  considering  that  there  are  intersecting  or  overlapping  of  multi-grids  CFD  so¬ 
lutions,  they  would  repeat  calculating  some  integral  segments  along  ray  path. 

We  develop  a  parallel  volume  rendering  method  available  for  Computational  Flow  Imaging 
to  translate  curvilinear  multiple  overlapping  grids  CFD  resolutions  into  new  rectilinear 
multi-blocks  data  fields  in  order  to  eliminate  errors  caused  by  intersected  or  overlapped  parts  of 
the  CFD  grids.  A  preprocessing  procedure  is  used  to  divided  the  whole  flow  field  into  new  regular 
regions  in  which  the  voxels  are  distributed  by  x  and  y  coordinates  to  increase  the  searching  speed. 
This  parallel  algorithm  is  implemented  in  a  linux-based  Beowulf  parallel  computing  system. 


2.  Optical  modeling 


Shadowgraphs,  schlieren  and  interferometric  techniques  commonly  used  to  visualize  refrac¬ 
tive  index  or  density  changes  in  flow  fields.  Shadowgraph  sysfems  are  often  used  where  fhe  den- 
sify  gradienfs  are  large.  This  fechnique  also  can  accommodafe  large  subjecfs,  is  relatively  simple 
in  ferms  of  materials  required  and  in  terms  of  cost  is  probably  the  least  expensive  technique  to 
set-up  and  operate.  Schlieren  systems  are  intermediate  in  terms  of  sensitivity,  system  complexity 
and  cost.  Schlieren  systems  deserve  attention  for  its  relative  simplicity  (the  equipment  is  straight¬ 
forward  to  align),  ease  of  application  (little  sensitivity  to  vibrations),  low  cost  (expensive  laser 
sources  are  not  required)  and  satisfactory  accuracy  of  results.  Interferometric  techniques  are  usu¬ 
ally  highly  sensitive,  complex  in  set-up,  can  provide  quantitative  information  but  are  expensive 
systems  and  can  only  deal  with  relatively  small  subjects. 

For  three-dimensional  flow  fields,  in  order  fo  evaluafe  line-of-sighf  infegrafion  fhrough  a  dis- 
crefe  three-dimensional  field,  an  algorithm  has  been  developed  which  calculates  the  values  of  the 
appropriate  integrand  at  certain  points,  after  which  the  integration  is  performed  according  the 
trapezoidal  rule. 

Shadowgraph  systems  respond  to  the  second  derivative  of  flow  densify  where  as  schlieren 
sysfems  respond  fo  the  first  derivative.  If  the  light  path  length  is  divided  into  n  segments,  for 
shadowgraph  systems,  the  distributions  for  the  optical  intensities  are 
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For  schlieren  systems,  if  the  knife-edge  or  color  filter  is  vertical,  the  optical  intensities  are 
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If  the  knife-edge  or  color  filter  is  horizontal,  the  optical  intensities  are 
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For  optical  Mach-Zehnder  or  holographic  interferometer,  the  light  intensity  /  is  proportional 
to  fringe  shift  F  by^ 
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In  order  to  calculate  the  Fringe  shift,  according  to  Mach-Zehnder  or  holographic  interferome¬ 
try,  the  bright  fringes  satisfy  the  equation*: 

2£x+\  [n(x,y,z)  -  nQ]dz  =  iA  (/  =  1,2,3, . )  (2.5) 

Js^ 

where  [n(x, y, z)-no]  is  the  refractive  index  relative  to  some  reference  field  the  path 


integral  is  evaluated  along  the  ray  path  s- ,  £  is  the  difference  of  the  angle  of  two  glass  plates  P\ 


and  P2,  i  is  the  integer  fringe  shift,  X  is  the  wavelength  of  the  light  source. 

When  there  is  no  disturb  in  the  fiowfield  of  the  test  volume,  if  £  =0,  the  beams  reaching  the 
photo  plane  are  with  the  same  phase,  so  there  are  no  reference  fringes  (infinite-fringe);  if  £  >0  or 
£  <0,  the  recombined  beams  will  interfere  and  produce  reference  fringes  (finite-fringe). 

The  reference  fringes  of  Eq.  (2.1)  and  the  following  equations  of  this  section  are  all  vertical. 
To  get  horizontal  reference  fringes,  we  can  use  2£  y  instead  of  term  2£  x  of  these  equations. 
Using  continuous  fringe  shift  F  instead  of  i  as  function  of  n(x,y,  z) ,  we  have 
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The  optical  path  is  along  z.  The  limits  of  integration,  zi  and  Z2  represent  the  length  along  the 
optical  path  length  where  the  index  of  refraction  is  varying. 

The  relation  between  gas  density  p  and  refraction  index  n  is  Gladstone -Dale  equation 
n-\  =  Kp,  (2.7) 

For  gases  of  different  species,  K  is  not  constant.  It  is 
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where  a .  is  the  mass  fraction  and  Ki  is  the  G-D  constant  of  the  specie. 


The  Eq.  (2.6),  Eq.  (2.7)  and  Eq.  (2.8)  constitute  the  theoretical  model  of  optical 
Mach-Zehnder  or  holographic  interferometric  systems  for  flow  density  measurement,  if  the  light 
path  length  is  divided  into  n  segments  in  a  discrete  three-dimensional  field,  then  the  fringe  shift  F 
is 
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3.  Parallel  Volume  Rendering 

CFI  assumes  that  the  considered  domain  contain  a  gas  with  variable  optical  properties  of 
shadowgraph,  schlieren  or  interferometric.  At  the  volume  rendering  the  domain  is  projected  on  the 
screen  in  such  a  way  that  analyzed  function  is  transformed  to  a  distribution  of  intensity  of  light  on 
view  plane.  This  transformation  uses  integral  function  given  by  the  relation  Eq.  (2.1)  ,  Eq.  (2.2)  , 
Eq.  (2.3)  and  Eq.  (2.9). 

Usually  the  calculation  grids  of  CFD  solutions  are  multizone,  curvilinear  or  irregular,  while 
the  ray  path  is  approximated  by  a  straight  lines  perpendicular  to  the  image  plane^.  Volume  render¬ 
ing  multizone,  curvilinear  or  irregular  grids  is  a  hard  operation  and  there  are  several  different  ap- 


proaches  to  this  problem. 

The  simplest  but  most  inefificient  way  is  to  resample  the  irregular  grid  to  a  regular  grid.  In 
order  to  achieve  the  necessary  accuracy,  a  high  enough  sampling  rate  has  to  be  used  what  in  most 
cases  will  make  the  resulting  regular  grid  volume  too  large  for  storage  and  rendering  purposes,  not 
mentioning  the  time  to  perform  the  resampling. 

Yates"^  used  a  method  similar  with  cell  projection.  In  cell  projection,  every  pixel  on  the  image 
plane  should  be  projected  otherwise  not  all  the  segments  in  a  light  path  are  included  in  the  accu¬ 
mulation  of  the  final  results.  The  contribution  of  each  computational  cell  to  the  line  integral  can  be 
found  independently.  The  disadvantage  of  cell  projection  is  that  for  intersected  or  overlapped  grids, 
the  overlapped  parts  of  the  segments  along  the  light  path  will  be  calculated  repeatedly  and  cause 
integral  errors. 

Raytracing  is  expensive  than  projection  because  it  need  to  search  in  the  cells  to  find  the  vox¬ 
els  intersected  by  the  light,  but  it  can  deal  with  any  complex  grids  of  the  CFD  flow  field.  Only 
raytracing  can  avoid  the  integral  error  in  multi-grids  curvilinear  intersecting  or  overlapping  volu¬ 
metric  datesets.  In  ray  tracing,  resampling  process  costs  much  more  times  than  integral  calcula¬ 
tion. 

We  develop  a  preprocessing  procedure  to  increase  the  speed  of  resampling  in  raytracing. 
First,  voxelization  is  used  to  convert  CFD  grids  from  their  continuous  geometric  representation 
into  a  set  of  voxels^.  Then  voxels  are  distributed  into  regular  regions  according  to  the  voxels’  lo¬ 
cation.  The  regular  regions  created  by  equally  divide  the  whole  flow  field  along  x  and  y  coordi¬ 
nates.  At  last  the  voxels  in  different  regions  are  distributed  into  different  parts  of  the  parallel  dis¬ 
tributed-memory  supercomputers  ready  for  parallel  graphic  rendering.  After  the  preprocessing 
procedure,  resampling  is  fast  from  two  aspects.  One  is  that  it  can  be  parallely  processed.  The  other 
is  that  each  sub-region  has  fewer  voxels  for  searching. 

Our  parallel  algorithm  benefit  from  the  parallel  distributed-memory  supercomputers  for  the 
increasing  of  the  number  of  CPU  and  the  sum  of  total  memories.  Additionally,  because  every  line 
integral  process  needs  to  search  through  all  voxels  in  a  block,  when  we  make  more  blocks  with 
fewer  voxels  in  each  block  after  region  dividing,  the  total  searching  speed  is  improved  despite  of 
the  cost  of  the  voxels’  distributing  process.  This  make  the  performance  increased  greatly. 

Ideally,  data  should  be  distributed  in  such  a  way  that  every  processor  requires  the  same 
amount  of  storage  space  and  incurs  the  same  computational  load.  Load  imbalances  are  unavoid¬ 
able  because  voxels  come  in  different  sizes  and  shapes.  The  difference  in  size  can  be  as  large  as 
several  orders  of  magnitude  due  to  the  adaptive  nature  of  the  mesh.  Load  imbalances  are  reduced 
by  making  all  the  regions  have  approximately  the  same  number  of  voxels. 


3  Test  Results 

For  convenience,  we  have  used  two  examples  for  our  initial  experiments.  Each  example  has  a 
small  curvilinear  grid  dataset  containing  about  0.5  million  hexahedral  cells.  The  first  example 
represents  shock  wave  over  two  bodies  in  a  channel  bend.  The  surface  mesh  of  the  dataset  is  dis¬ 
played  in  Figure  1 .  The  first  body  is  a  cylinder-cone  with  the  base  diameter  of  22mm  and  height  of 
16mm,  and  the  second  one  is  the  square  pillar  with  base  size  of  16mmX  16mm  and  12mm  height. 
The  second  example  is  for  examining  the  hypersonic  dynamics  of  a  ball  shape  (12mm)  in  the  bal¬ 
listic  range.  We  plan  to  conduct  further  tests  using  a  larger  dataset  with  several  million  cells. 


We  implemented  our  volume  tenderer  in  the  C++  language  using  MPI  message  passing***  for 
interprocessor  communication.  All  tests  were  run  on  a  Beowulf  parallel  computing  system  built  by 
ourselves.  The  Beowulf  parallel  computing  system  is  a  distributed-memory  architecture  which 
employs  a  switch-based  processor  interconnect.  Our  Beowulf  parallel  computing  system  runs  on 
Redhat  linux  5.0  and  it  is  populated  with  9  nodes  based  on  800  MHz  Pentium  III  processor  and 
incorporate  a  1 00  Mb/s  ethemet  switch.  . 


Figure!  Comparison  of  experi¬ 
mental  and  computational  inter- 
ferograms 


Computational 

Figure  3  Comparison  of 

experimental  and  computational 
shadowgraphs 


Figure  2  is  a  comparison  of  experimental  and  computational  interferograms  of  the  first  ex¬ 
ample.  The  test  is  performed  at  Mach  number  M=o  of  1.51  and  wall  temperature  T„  of  285K.  CFD 
use  laminar  conservative  full  N-S  equation  with  TVD  scheme  in  convective  terms  and  with  center 
difference  in  viscous  terms.  The  two  steps  Range-Kutta  method  is  used  for  unsteady  flow  so  as  to 
obtain  the  second  order  accuracy  in  time  and  space. 

Figure  3  is  a  comparison  of  experimental  and  computational  shadowgraphs  of  the  second 
example.  The  Reynolds  number  based  on  model  diameter  (12mm)  for  this  test  is  near  10^.  The  test 
is  performed  at  Mach  number  of  15  and  wall  temperature  T„  of  300K.  Numerical  simulation 
is  based  on  full  unsteady  Navier-Stokes  equations. 

Table  1  shows  in  the  first  example  the  time  cost  comparison  of  one  processor  PC  and  the  nine 
processors  parallel  system. 


distributing  voxels 

Resampling 

Total  cost 

One  processor 

243  seconds 

2018  seconds 

2261  seconds 

Nine  processors 

31  seconds 

202  seconds 

233  seconds 

Table  1  Time  cost  comparison  of  one  processor  and  nine  processors  system 


1.  Conclusion 


By  combining  a  spatial  partitioning  scheme  with  techniques  which  devided  the  flow  field 
into  regular  regions,  we  have  produced  a  parallel  volume  renderer  for  Computational  Flow 
Imaging  which  employs  inexpensive  static  load  balancing  to  achieve  high  performance  and 
reasonable  efficiency  with  modest  memory  consumption.  We  believe  that  our  algorithm  has 
provide  an  effective  way  for  rendering  complex  grids  of  Computational  Flow  Imaging  on  dis¬ 
tributed-memory  message-passing  architectures.  Detailed  performance  experiments  with  up  to 
9  processors  lead  us  to  believe  that  further  improvements  are  possible. 

We  plan  to  conduct  additional  tests  with  larger  datasets,  different  image  sizes,  more 
processors,  and  other  architectures.  With  larger  datasets,  the  number  of  ray  segments  gener¬ 
ated  may  increase  significantly,  and  we  need  to  assess  the  impact  of  this  additional  commu¬ 
nication  load  on  overall  performance.  We  also  want  to  investigate  the  potential  for 
finer-grained  image  partitionings  and  improved  termination  strategies  to  improve  the  parallel 
efficiency  of  our  approach. 
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